library(rtemis)
.:rtemis 1.0.0 🌊 aarch64-apple-darwin20
library(data.table)

dat <- read("../Data/data.xlsx")
inspect(dat)
<data.table> 500 x 7
Lab: <chr> Lab E, Lab I, Lab E, Lab E...
Organism: <chr> No Significant Growth, No growth, Normal flora, Candida spp....
Sex: <chr> Male, Female, Female, Female...
Department: <chr> Out Patient Department, Pediatric ward, Out Patient Department, Gynaecology ward...
Year: <num> 2021.00, 2023.00, 2021.00, 2021.00...
Specimentype: <chr> Urine, Blood, Stool, Cervical swab...
Hospitalized48hrs: <chr> No, No, No, No...
check_data(dat)
dat: A data.table with 500 rows and 7 columns.
Data types
* 1 numeric feature
* 0 integer features
* 0 factors
* 6 character features
* 0 date features
Issues
* 0 constant features
* 118 duplicate cases
* 0 missing values
Recommendations
* Consider converting character features to factors or excluding them.
* Consider removing the duplicate cases.
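Both recommendations can also be applied with plain data.table; this demo uses rtemis's preprocess() below, but a minimal base sketch looks like this:

```r
# Convert character columns to factors and drop duplicate rows
# using data.table directly (the demo uses preprocess() instead).
chr_cols <- names(dat)[sapply(dat, is.character)]
dat[, (chr_cols) := lapply(.SD, factor), .SDcols = chr_cols]
dat <- unique(dat)
```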
Create a Preprocessor object:
prp <- preprocess(
  dat,
  config = setup_Preprocessor(character2factor = TRUE, remove_duplicates = TRUE)
)

Get the preprocessed data:
datp <- preprocessed(prp)

Re-check data:
check_data(datp)
datp: A data.table with 382 rows and 7 columns.
Data types
* 1 numeric feature
* 0 integer features
* 6 factors, of which 0 are ordered
* 0 character features
* 0 date features
Issues
* 0 constant features
* 0 duplicate cases
* 0 missing values
Recommendations
* Everything looks good
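As a quick sanity check, the row counts are consistent: the 500 original rows minus the 118 duplicate cases flagged by check_data() leave the 382 rows reported above.

```r
# Row count before preprocessing minus duplicate cases
# should match the preprocessed row count reported above.
rows_before <- 500L
duplicates <- 118L
rows_before - duplicates
# 382
```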
We train four models using different algorithms but the same outer resampling folds.
Note that we perform only minimal tuning to keep the demo runtime short.
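If runtime were not a concern, the tuning grid could be widened by passing more values per hyperparameter. A sketch, assuming setup_LightGBM() grid-searches over each parameter supplied as a vector (as in this demo's two-value learning_rate grid); this is illustrative, not run here:

```r
# Sketch: a wider learning-rate grid than the demo's c(0.001, 0.01).
# Assumes setup_LightGBM() grid-searches over each supplied vector.
hospitalized48_lightgbm_tuned <- train(
  datp,
  hyperparameters = setup_LightGBM(
    learning_rate = c(0.001, 0.01, 0.1)
  ),
  outer_resampling_config = setup_Resampler(seed = 650)
)
```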
Train an elastic net model (GLMNET):
hospitalized48_glmnet <- train(
  datp,
  algorithm = "glmnet",
  outer_resampling_config = setup_Resampler(seed = 650)
)

<Resampled Classification Model>
GLMNET (Elastic Net)
⚙ Tuned using exhaustive grid search.
⟳ Tested using 10 independent folds.
<Resampled Classification Training Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.906 (3e-03)
Specificity: 0.629 (0.033)
Balanced_Accuracy: 0.767 (0.017)
PPV: 0.716 (0.018)
NPV: 0.866 (0.008)
F1: 0.800 (0.012)
Accuracy: 0.770 (0.017)
AUC: 0.850 (0.008)
Brier_Score: 0.172 (0.006)
<Resampled Classification Test Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.881 (0.061)
Specificity: 0.580 (0.122)
Balanced_Accuracy: 0.731 (0.075)
PPV: 0.688 (0.065)
NPV: 0.823 (0.102)
F1: 0.771 (0.056)
Accuracy: 0.733 (0.075)
AUC: 0.787 (0.067)
Brier_Score: 0.191 (0.025)
Plot ROC curve:
plot_roc(
  hospitalized48_glmnet,
  main = "GLMNET"
)

Plot variable importance:
plot_varimp(
  hospitalized48_glmnet,
  show_top = 11L
)

Train a CART model:
hospitalized48_cart <- train(
  datp,
  algorithm = "cart",
  outer_resampling_config = setup_Resampler(seed = 650)
)

<Resampled Classification Model>
CART (Classification and Regression Trees)
⟳ Tested using 10 independent folds.
<Resampled Classification Training Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.944 (0.020)
Specificity: 0.781 (0.032)
Balanced_Accuracy: 0.863 (0.010)
PPV: 0.817 (0.020)
NPV: 0.932 (0.019)
F1: 0.876 (0.008)
Accuracy: 0.864 (0.010)
AUC: 0.903 (0.020)
Brier_Score: 0.106 (0.007)
<Resampled Classification Test Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.824 (0.063)
Specificity: 0.659 (0.123)
Balanced_Accuracy: 0.742 (0.072)
PPV: 0.720 (0.076)
NPV: 0.784 (0.069)
F1: 0.767 (0.060)
Accuracy: 0.744 (0.071)
AUC: 0.761 (0.089)
Brier_Score: 0.204 (0.055)
Plot ROC curve:
plot_roc(
  hospitalized48_cart,
  main = "CART"
)

Plot variable importance:
plot_varimp(
  hospitalized48_cart
)

Train a LightGBM random forest model:
hospitalized48_lightrf <- train(
  datp,
  algorithm = "lightrf",
  outer_resampling_config = setup_Resampler(seed = 650)
)

<Resampled Classification Model>
LightRF (LightGBM Random Forest)
⟳ Tested using 10 independent folds.
<Resampled Classification Training Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.805 (0.012)
Specificity: 0.665 (0.023)
Balanced_Accuracy: 0.735 (0.015)
PPV: 0.713 (0.015)
NPV: 0.767 (0.015)
F1: 0.756 (0.012)
Accuracy: 0.736 (0.015)
AUC: 0.783 (0.006)
Brier_Score: 0.220 (1.3e-03)
<Resampled Classification Test Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.783 (0.097)
Specificity: 0.644 (0.105)
Balanced_Accuracy: 0.714 (0.065)
PPV: 0.698 (0.066)
NPV: 0.750 (0.097)
F1: 0.735 (0.060)
Accuracy: 0.714 (0.065)
AUC: 0.732 (0.061)
Brier_Score: 0.227 (0.008)
Plot ROC curve:
plot_roc(
  hospitalized48_lightrf,
  main = "LightRF"
)

Plot variable importance:
plot_varimp(
  hospitalized48_lightrf
)

Train a gradient boosting model (LightGBM):
hospitalized48_lightgbm <- train(
  datp,
  hyperparameters = setup_LightGBM(
    learning_rate = c(0.001, 0.01)
  ),
  outer_resampling_config = setup_Resampler(seed = 650)
)

<Resampled Classification Model>
LightGBM (Gradient Boosting)
⚙ Tuned using exhaustive grid search.
⟳ Tested using 10 independent folds.
<Resampled Classification Training Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.832 (0.023)
Specificity: 0.741 (0.020)
Balanced_Accuracy: 0.787 (0.017)
PPV: 0.768 (0.015)
NPV: 0.811 (0.023)
F1: 0.799 (0.017)
Accuracy: 0.787 (0.017)
AUC: 0.871 (0.018)
Brier_Score: 0.155 (0.016)
<Resampled Classification Test Metrics>
Showing mean (sd) across resamples.
Sensitivity: 0.778 (0.086)
Specificity: 0.660 (0.077)
Balanced_Accuracy: 0.719 (0.061)
PPV: 0.704 (0.055)
NPV: 0.747 (0.084)
F1: 0.737 (0.060)
Accuracy: 0.720 (0.061)
AUC: 0.792 (0.064)
Brier_Score: 0.188 (0.031)
Plot ROC curve:
plot_roc(
  hospitalized48_lightgbm,
  main = "LightGBM"
)

Plot variable importance:
plot_varimp(
  hospitalized48_lightgbm
)

Finally, present all four models together to compare their performance:
present(
  list(
    hospitalized48_glmnet,
    hospitalized48_cart,
    hospitalized48_lightrf,
    hospitalized48_lightgbm
  )
)

Elastic Net (GLMNET), Classification and Regression Trees (CART), LightGBM Random Forest (LightRF), and Gradient Boosting (LightGBM) were used for classification.
The top-performing model was CART with a test-set Balanced Accuracy of 0.742, followed by GLMNET, LightGBM, and LightRF with Balanced Accuracies of 0.731, 0.719, and 0.714, respectively.
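The Balanced Accuracy figures in the summary are, up to rounding of the reported means, simply the mean of sensitivity and specificity, which can be verified from each model's mean test-set metrics above:

```r
# Balanced Accuracy = (Sensitivity + Specificity) / 2,
# using the mean test-set metrics reported above.
sens <- c(glmnet = 0.881, cart = 0.824, lightrf = 0.783, lightgbm = 0.778)
spec <- c(glmnet = 0.580, cart = 0.659, lightrf = 0.644, lightgbm = 0.660)
(sens + spec) / 2
```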